Week 4 BIG DATA TIME

In this class, we mainly learned how to collect, crawl, and organize data, and how to use web plug-ins and software to obtain the content we want.

Chinese scholars are fond of calling the present the era of big data, and they place data on a par with energy in importance. In their view, success in the information age depends on developing and using data. Influenced by this idea, I learned how to crawl data during my undergraduate studies, and I even used Python to obtain all agricultural policies from government websites, including their issuing agencies, publication dates, original texts, and scopes of application. Grassroots staff in China need to go to the countryside to talk to farmers, explain the latest policies to them, and guide them toward better decisions. However, the network signal in remote areas is poor, so they often cannot open the web pages on site. I hoped that, with crawled data, the staff could quickly gather the useful information in their office before setting out.
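
As a rough illustration of the policy crawling described above, here is a minimal sketch in Python. It is not the code I actually used. The HTML snippet, the class names (policy, title, agency, date, scope), and the helper names PolicyParser, parse_policies, and to_csv are all assumptions made up for the example, and a real government site will be structured differently. It parses a page that has already been downloaded and writes the records to CSV text, which could then be carried offline.

```python
import csv
import io
from html.parser import HTMLParser

# Hypothetical markup: real government sites use different tags and class names.
SAMPLE_HTML = """
<ul>
  <li class="policy">
    <span class="title">Example Policy A</span>
    <span class="agency">Example Agency</span>
    <span class="date">2020-01-01</span>
    <span class="scope">Example Province</span>
  </li>
  <li class="policy">
    <span class="title">Example Policy B</span>
    <span class="agency">Example Agency</span>
    <span class="date">2020-02-01</span>
    <span class="scope">Nationwide</span>
  </li>
</ul>
"""

# Field names assumed for this sketch; they double as the CSV header.
FIELDS = ("title", "agency", "date", "scope")


class PolicyParser(HTMLParser):
    """Collects one dict per <li class="policy"> item."""

    def __init__(self):
        super().__init__()
        self.records = []
        self._current = None  # record being filled, or None outside an item
        self._field = None    # field whose text is being read, or None

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class")
        if tag == "li" and cls == "policy":
            self._current = {}
        elif tag == "span" and self._current is not None and cls in FIELDS:
            self._field = cls

    def handle_data(self, data):
        if self._field is not None:
            previous = self._current.get(self._field, "")
            self._current[self._field] = previous + data.strip()

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None
        elif tag == "li" and self._current is not None:
            self.records.append(self._current)
            self._current = None


def parse_policies(html):
    """Return a list of policy records found in the given HTML text."""
    parser = PolicyParser()
    parser.feed(html)
    parser.close()
    return parser.records


def to_csv(records):
    """Serialize records to CSV text so they can be read without a network."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


if __name__ == "__main__":
    print(to_csv(parse_policies(SAMPLE_HTML)))
```

The sketch only uses the standard library and leaves out the download step on purpose, since fetching pages depends on each site's rules and access limits.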

In this lesson, I learned that not all data is free or easily accessible. Website owners may use technical or other means to block access to data for personal or commercial gain. When we tried to crawl BBC movie reviews in this lesson, the page did not expose the review content directly as elements, and we could not even find it in the page code, which made the exercise difficult. Later, I also had difficulty crawling data from the Chinese platform Weibo, which frequently asked me to log in or complete a verification step and so interrupted my progress.

In fact, this brings us back to square one. Everyone recognizes the importance of data, but no one wants to share it. Instead, they prefer to monopolize it to serve their own interests. Large platforms treat users as digital laborers who continuously produce data, then use that data as capital for their own development, and ultimately draw on it for decision-making support or sell it for profit.
